Simplex projection made simple
School on Physics Applications in Biology, January 2018
The first part of this tutorial introduces the rationale of Simplex Projection, which is one of the building blocks of Empirical Dynamics Modelling (EDM). In the second part we show how to run simplex projection with the rEDM R package (Ye et al. 2017) to forecast values in a time series. We show also how the forecast skill of simplex projection reveals important properties of the dynamics behind a time series. Hence, you will need the most recent version of R and the rEDM package installed in your working computer.
Before you proceed, please read carefully the introduction and the first section (“Empirical Dynamic Modelling”) of rEDM’s tutorial that you can find here or in your local R installation with the command
Simplex projection: a visual walthrough 1
The shadow manifold
We will start by simulating a deterministic discrete-time dynamics with chaotic behavior. To create a time series from this dynamics with 150 time steps run the commands below in the R console:
## Two vectors to store data
X <- c()
Y <- c()
## Initial values
X[1] <- 0.1
Y[1] <- 0.3
X[2] <- 0.3
Y[2] <- 3.78*Y[1] - 3.78*Y[1]^2
## Iterate the dynamics 150 time steps
for(i in 3:150){
X[i] <- 3.77*X[i-1] - 3.77*X[i-1]^2 - 0.85*Y[i-1]*X[i-1] - 0.5*X[i-2]
Y[i] <- 3.78*Y[i-1] - 3.78*Y[i-1]^2
}If all information you have about this system is the time series of variable X the dynamics looks pretty messy:
Taken’s Theorem provides a simple way to recover the rules that govern a dynamical system by building a lagged cordinate plot. Below you can play with such a plot for the time series of X. Each point in this 3-D plot depicts the state of the system at a given time t, and in the next two time steps (t+1 and t+2). The points form a clear pattern, which means that the possible states of the system are constrained to a certain subset of the plotting space, with a well defined shape. Please rotate the plot to check this.
We call the plot space an embedding and the shape it reveals a shadow manifold of the true attractor of the dynamical system. For now let’s assume that the embbedding depicted below satisfies Taken’s Theorem 2, and thus nearby points in the embedding space correspond to similar system states.
Now suppose we want to predict the last point of the time series (t = 150). The blue point in the lagged plot below is X(t-3, t-2, t-1), which is the state of the system immediately before t = 150. The first, second, third and fourth nearest neighbors of the blue point are depicted in red,green, orange, and magenta, respectively. If you zoom in the points you will see segments linking each neighbor to the focal point. The length of these segments are Euclidian distances. You can also pan and rotate the plot to get a better idea of the spatial pattern of these points.
The neigbor points correspond to the states of the system most similar to the focal state that are available in the time series. The plot below shows each of these states, with the same code color used in the embbeding plot above.
Simplex forecasting
And here is the trick: because the neighbor points in the shadow manifold are trajectories that match the focal one, they provide a good guess of what will come next. To show this in the plot below we moved a time step forward in each trajectory that corresponds to the neighbor points in the shadow manifold. The arrows project the resulting values of X to t=150, and the black triangle is an average of these projected points, which is the forecasted value of X for this time step. The actual value of X(t=150) is the last point in the series. Not bad!
Simplex projection is therefore a forecast for a given state based on the average behavior of similar states of the system. Such average is weighted by the distance of each neighbor to the focal point in the shadow manifold - that is, closer points have more weight because they match better the focal trajectory.
The plot below show the focal and neighbor points at their original position and projected, that is, at their new positions when we move one time step forward. The black point which is linked to the projected neighbors is the forecasted state. The blue point close to the black one is the actual state. You can zoom, pan and rotate the figure to check that the distances of the projected neighbor points to the forecast are scaled to the distances of the original neighbors to the focal point.
The full pseudocode for simplex projection forecast is available here.
Simplex projection in practice
Before proceeding please read the subsections “Nearest Neighbor Forecasting using Simplex Projection” and “Prediction Decay” in the rEDM tutorial.
In this section we will use the rEDM package to forecast many points in the time series we created, and also to get some important information about the dynamics that generated this series. To start open R and load the package typing in the R console
The function simplex in the rEDM package runs simplex projections for a time series. The argument E sets the embedding dimensions (number of time lags in the lagged plot) and the argument stats_only sets if only statistics of the forecasts should be stored (or the forecatsed values too).
The command below forecasts one step foward each point in the series for an embedding dimension of three (as we used in the previous section), and asks to save the forecasted values:
The function returns a list with many elements, most of them statistics about the forecast:
## [1] "E" "tau" "tp"
## [4] "nn" "num_pred" "rho"
## [7] "mae" "rmse" "perc"
## [10] "p_val" "const_pred_num_pred" "const_pred_rho"
## [13] "const_pred_mae" "const_pred_rmse" "const_pred_perc"
## [16] "const_p_val" "model_output"
The last element of the list, called model_output, is a list of dataframes with the time, observed and predicted values and the variance of predicted values:
## dataframe with obs and predicted in a separated object
fits <- predE3$model_output[[1]]
head(fits)## time obs pred pred_var
## 1 4 0.5030702 0.3636169 8.193049e-03
## 2 5 0.2915131 0.3533368 2.930746e-03
## 3 6 0.4366629 0.4492918 1.441837e-05
## 4 7 0.4564548 0.4841492 2.783205e-04
## 5 8 0.5577898 0.5034417 5.362669e-03
## 6 9 0.2680250 0.2813136 4.391430e-04
The one-step foward forecast was very good for most of the points:
plot(pred ~ time, data = fits, type = "l", col = "blue", lwd=3,
xlab="Time", ylab="X", ylim=range(fits[,2:3]))
lines(obs ~ time, data = fits, col=grey.colors(1, alpha=0.25), lwd = 6)
legend("topright", c("Observed", "Predicted"), lty=1, lwd=c(6,3),
col=c(grey.colors(1, alpha=0.25), "blue"),bty="n")The Pearson linear correlation between observed and predicted values is a measure of forecast skill. This correlation is one of the statistics available in the object returned by the simplex function:
## [1] 0.9233967
The correlation is close to one, which indicates an excelent forecast skill.
Optimal embedding dimension
We can run simplex forecasts for different embedding dimensions simply providing multiple values to the argument E. The command below runs simplex projection forecasts using two, three and ten embedding dimensions. The observed and predicted values for each projection are then used to plot predicted x observed values:
predE2 <- simplex(time_series = X, E = c(2,3,10), stats_only = FALSE)
par(mfrow=c(1,3))
plot(pred ~ obs, data = predE2$model_output[[1]],
main = bquote("Embedding = 2, " ~ rho == .(round(predE2$rho[1],2))))
plot(pred ~ obs, data = predE2$model_output[[2]],
main = bquote("Embedding = 3, " ~ rho == .(round(predE2$rho[2],2))))
plot(pred ~ obs, data = predE2$model_output[[3]],
main = bquote("Embedding = 10, " ~ rho == .(round(predE2$rho[3],2))))
par(mfrow=c(1,1))Embeddings with too few or too many dimensions do not unfold the shadow manyfold properly and then worsens the forecast. Thus, we can use a plot of the forecast skill x embedding dimension to find out the optimal number of dimensions (which in the current case is three, as we guessed in the previous section):
find.emb <- simplex(time_series = X, E = 1:10)
plot(rho ~ E, data=find.emb, type="b",
xlab = "Embedding dimensions",
ylab = expression(paste("Forecast skill (",rho,")",sep="")))Prediction decay
So far we used simplex projection to forecast one time step forward. The commands below run forecasts for one to ten time steps ahead (argument tp = 1:10 in the function simplex) and plot the forecast skill in function of these times to prediction:
pred.decay <- simplex(time_series = X, E = 3, tp = 1:10)
plot(rho ~ tp, data=pred.decay,
type = "b",
xlab = "Time to prediction",
ylab = expression(paste("Forecast skill (",rho,")",sep="")))There is a sharp decline of the forecast skill when we increase the time to prediction. This is a characteristic feature of chaotic dynamics and thus this plot can be used to distinguish deterministic chaos from random noise.
Distiguishing error from chaos
We will appply the predcition-decay diagnostics to non-chaotic time series, starting with a pure deterministic simulation. The code below simulates 150 time steps of the logistic map at the verge of chaos:
X2 <- c()
X2[1] <- 0.5
for(i in 2:150)
X2[i] <- 3.569949 * X2[i-1] * ( 1- X2[i-1] )
## Plots the series
plot(X2, xlab="Time", ylab="X", type="b", lty=3) The forecast skill is very high and varies very little with the embedding dimension, so we will use an embedding dimension of two.
find.emb2 <- simplex(time_series = X2, E = 1:10)
plot(rho ~ E, data=find.emb2, type="b",
ylim=c(0,1),
xlab = "Embedding dimensions",
ylab = expression(paste("Forecast skill (",rho,")",sep="")))And there is no decay in the forecast skill with the time to prediction, which is not suprising for a periodic deterministic time series:
Things can get more interesting if we add some noise to the time series. The commands below simulate that the time series has independent measurement errors with constant variance. To do that we add to each observation a value drawn from a Gaussian distribution with zero mean and standard deviation equal to those of the time series itself.
## Adding noise
X3 <- X2 + rnorm(n = length(X2), mean = 0, sd = sd(X2))
## Plot series
plot(X3, xlab="Time", ylab="X", type="b", lty=3) Now the series looks chaotic. Or should we say ‘only noisy’?
The prediction decay plot shows an overall decrease of forecast skill caused by the addition of measurement error, but there is no sign of prediction decay with forecast time:
pred.decay3 <- simplex(time_series = X3, E = 6, tp = 1:50)
plot(rho ~ tp, data=pred.decay3,
type = "l",
xlab = "Time to prediction",
ylab = expression(paste("Forecast skill (",rho,")",sep="")),
ylim = c(0,1))Problems
During the 1950’s the entomologist Alexander Nicholson carried out a series of detailed experiments with caged populations of the sheep blowfly in Australia. The results suggested a very complex, non-linear system and inspired many advances in the mathematical modelling of population dynamics. For a detailed historical and mathematical account see Brilinger, J Time Ser Analisys 2012.
In this exercise use the tools provided in this tutorial to investigate how complex is the dynamics behind these time series. To download one of the few time series that were preserved, use the commands below:
nich97I <- read.csv("https://www.stat.berkeley.edu/~brill/blowfly97I")
plot(nich97I$total, type="b", xlab="Time", ylab="Total number of flies")Hints
1
The time series has some periodicity, which in some cases can be caused by seasonal factors. This makes EDM analysis much more complicated3. Nevertheless in some cases you can keep it simple by analysing the time series of the differentiated values:
2
S-mapping (Sugihara 1994) is another forecasting method that can be used for identify another aspect of complex dynamics, called state-dependent behavior. This method is available in rEDM (see the tutorial of the package).
Learn more
- Sugihara G. and R. M. May. 1990. Nonlinear forecasting as a way of distinguishing chaos from measurement error in time series. Nature 344:734–741.
- Sugihara G. 1994. Nonlinear forecasting for the classification of natural time series. Philosophical Transactions: Physical Sciences and Engineering 348:477–495.
- Anderson C. & Sugihara G. Simplex projection - Order out the chaos. http://deepeco.ucsd.edu/simplex/
- “Simplex projection walkthrough”, a tutorial by Owen Petchey.
This section is adapted from the tutorial “Simplex projection walkthrough”, by Owen Petchey.↩
The next section shows some compealling evidence for this.↩
For a full appraisal see: Deyle, E.R., Maher, M.C., Hernandez, R.D., Basu, S. and Sugihara, G., 2016. Global environmental drivers of influenza. Proceedings of the National Academy of Sciences, 113(46), pp.13081-13086.↩